Current LLMs are generally aligned to follow safety requirements and tend to refuse toxic prompts. However, LLMs can fail to refuse toxic prompts, or they can be overcautious and refuse benign ones. In addition, state-of-the-art toxicity detectors have low true positive rates (TPRs) at low false positive rates (FPRs), which incurs high costs in real-world applications where toxic examples are rare. In this paper, we introduce Moderation Using LLM Introspection (MULI), which detects toxic prompts using information extracted directly from the LLMs themselves. We found that benign and toxic prompts can be distinguished by the distribution of the first response token’s logits. Using this idea, we build a robust detector of toxic prompts with a sparse logistic regression model on the first response token’s logits. Our scheme outperforms state-of-the-art detectors under multiple metrics.
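The detector idea above can be illustrated with a minimal sketch, assuming the first-response-token logits have already been extracted into fixed-length feature vectors. The data are synthetic (a hypothetical four-token vocabulary in which only feature 0, standing in for a refusal-like token, carries signal), and the L1-penalized logistic regression is fitted with plain proximal gradient descent rather than any particular library. The helper names (`make_data`, `fit_sparse_logreg`, `tpr_at_fpr`) are illustrative and not taken from the paper; `tpr_at_fpr` shows the "TPR at low FPR" metric the abstract refers to.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrinks v toward zero and clips small values to exactly 0.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_sparse_logreg(X, y, lam=0.01, lr=0.1, epochs=300):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    X: list of feature vectors (here, hypothetical first-response-token logits).
    y: list of 0/1 labels (1 = toxic prompt).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        # Gradient step on the average log-loss, then the L1 proximal step on the weights.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, gw)]
        b -= lr * gb  # the bias is left unpenalized
    return w, b


def tpr_at_fpr(scores, labels, max_fpr):
    """TPR when flagging only scores above the (k+1)-th highest benign score,
    so at most k = floor(max_fpr * #benign) benign prompts are flagged."""
    neg = sorted((s for s, lab in zip(scores, labels) if lab == 0), reverse=True)
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    k = int(max_fpr * len(neg))
    thresh = neg[k] if k < len(neg) else -math.inf
    return sum(s > thresh for s in pos) / len(pos)


def make_data(n, seed=0):
    # Synthetic stand-in for extracted logits: 4 features, only feature 0
    # (a hypothetical refusal-like token) depends on the label.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        x0 = rng.gauss(2.0 if label else -2.0, 1.0)
        X.append([x0] + [rng.gauss(0.0, 1.0) for _ in range(3)])
        y.append(label)
    return X, y


X, y = make_data(200)
w, b = fit_sparse_logreg(X, y)
# Scored on the training data for brevity; a real evaluation would use held-out prompts.
scores = [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]
print("weights:", [round(v, 3) for v in w])
print("TPR at FPR <= 1%:", tpr_at_fpr(scores, y, 0.01))
```

The L1 penalty is what makes the model sparse: at a fixed point of the soft-thresholding update, any weight whose log-loss gradient stays below `lam` in magnitude is exactly zero, which matters when the feature vector spans a full vocabulary of token logits.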

Hu, Zhanhao; Piet, Julien; Zhao, Geng; Jiao, Jiantao; Wagner, David (Advances in Neural Information Processing Systems)

Zhao, Geng; Zhu, Banghua; Jiao, Jiantao; Jordan, Michael (Proceedings of Machine Learning Research)
We study the problem of online learning in a two-player decentralized cooperative Stackelberg game. In each round, the leader first takes an action, followed by the follower, who takes their action after observing the leader’s move. The goal of the leader is to learn to minimize cumulative regret based on the history of interactions. Unlike the traditional formulation of repeated Stackelberg games, we assume the follower is omniscient, with full knowledge of the true reward, and always best-responds to the leader’s actions. We analyze the sample complexity of regret minimization in this repeated Stackelberg game. We show that, depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically, from constant to exponential, even for linear cooperative Stackelberg games.
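The interaction protocol described above can be sketched as a small simulation under several assumptions that are not from the paper: finite leader and follower action sets, a shared reward matrix known to the follower (the cooperative setting), Gaussian noise on the reward the leader observes, and a leader running UCB1 over its own actions. UCB1 is a swapped-in baseline, not the method analyzed in the paper, and the names `best_response`, `play_ucb_leader`, and the matrix `R` are hypothetical. Because the follower always best-responds, the leader in this toy version effectively faces a bandit whose arm values are max over b of r(a, b).

```python
import math
import random


def best_response(reward, a):
    # Omniscient follower: knows the true shared reward and best-responds to leader action a.
    return max(range(len(reward[a])), key=lambda b: reward[a][b])


def play_ucb_leader(reward, rounds, noise=0.1, seed=0):
    """Repeated cooperative Stackelberg game with an omniscient follower.

    Each round: the leader picks a, the follower observes a and best-responds with b,
    and the leader observes a noisy sample of the shared reward reward[a][b].
    The leader runs plain UCB1 over its own actions (a simple baseline).
    """
    rng = random.Random(seed)
    n_a = len(reward)
    best_value = max(max(row) for row in reward)  # value of the best (a, BR(a)) pair
    counts, sums = [0] * n_a, [0.0] * n_a
    regret = 0.0
    for t in range(1, rounds + 1):
        if t <= n_a:
            a = t - 1  # try every leader action once
        else:
            a = max(range(n_a), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        b = best_response(reward, a)  # follower moves after seeing a
        observed = reward[a][b] + rng.gauss(0.0, noise)
        counts[a] += 1
        sums[a] += observed
        regret += best_value - reward[a][b]  # pseudo-regret w.r.t. the true reward
    return regret, counts


# Hypothetical shared reward matrix: rows = leader actions, columns = follower actions.
R = [[0.2, 0.5, 0.1],
     [0.9, 0.3, 0.4],
     [0.6, 0.7, 0.0]]
regret, counts = play_ucb_leader(R, rounds=2000)
print("cumulative regret:", round(regret, 2), "action counts:", counts)
```

This toy reduction only illustrates the move order and the regret bookkeeping; the abstract's result is that, depending on the reward structure, the sample complexity can range from constant to exponential, and this sketch does not attempt to reproduce that.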

Cao, Taifeng; Wang, Shaohua; Zhao, Geng; Wu, Xinlong; Liaw, Peter K.; Qiao, Junwei (Journal of Alloys and Compounds)